28 research outputs found

    Feature selection and intelligent livestock management

    Computational animal breeding relies on genetic-statistical models that estimate breeding values, which in turn are used to rank animals by their genetic potential for certain traits of interest (e.g. size, milk yield). However, modern livestock production systems collect large amounts of data throughout the life of an animal that are not directly suited to those statistical models, such as periodic phenotype values (physical traits such as weight measurements) and environmental observations. In this thesis, we explore the potential of using that additional data to improve future phenotype prediction in livestock using machine learning methods.

    A framework for feature selection through boosting

    As dimensions of datasets in predictive modelling continue to grow, feature selection becomes increasingly practical. Datasets with complex feature interactions and high levels of redundancy still present a challenge to existing feature selection methods. We propose a novel framework for feature selection that relies on boosting, or sample re-weighting, to select sets of informative features in classification problems. The method uses as its basis the feature rankings derived from fast and scalable tree-boosting models, such as XGBoost. We compare the proposed method to standard feature selection algorithms on 9 benchmark datasets. We show that the proposed approach reaches higher accuracies with fewer features on most of the tested datasets, and that the selected features have lower redundancy.

    Segmentation in large-scale cellular electron microscopy with deep learning: A literature survey

    Electron microscopy (EM) enables high-resolution imaging of tissues and cells based on 2D and 3D imaging techniques. Due to the laborious and time-consuming nature of manual segmentation of large-scale EM datasets, automated segmentation approaches are crucial. This review focuses on the progress of deep learning-based segmentation techniques in large-scale cellular EM throughout the last six years, during which significant progress has been made in both semantic and instance segmentation. A detailed account is given of the key datasets that contributed to the proliferation of deep learning in 2D and 3D EM segmentation. The review covers supervised, unsupervised, and self-supervised learning methods and examines how these algorithms were adapted to the task of segmenting cellular and sub-cellular structures in EM images. The special challenges posed by such images, like heterogeneity and spatial complexity, and the network architectures that overcame some of them are described. Moreover, an overview of the evaluation measures used to benchmark EM datasets in various segmentation tasks is provided. Finally, an outlook on current trends and future prospects of EM segmentation is given, especially with large-scale models and unlabeled images to learn generic features across EM datasets.

    Pre-insemination prediction of dystocia in dairy cattle

    Dystocia, or difficult calving, in cattle is detrimental to the health of the afflicted cows and has a negative economic impact on the dairy industry. The goal of this study was to create a data-driven tool for predicting the calving difficulty of non-heifer cows using input variables that are known prior to the moment of insemination. Compared to past studies, we excluded input variables that can only be known during or after insemination, such as birth weight and gestation length. This makes the model suitable for informing mating decisions that could reduce the incidence of difficult calvings or mitigate their consequences. We used a dataset consisting of 131,527 calving records of Holstein cattle, from which we derived a total of 274 phenotypic features and estimated breeding values. The distribution of classes in the dataset was 96.7% normal calvings and 3.3% difficult calvings. We used gradient boosted trees (XGBoost) as the learning model and a bagging ensemble approach to deal with the extreme class imbalance. The model achieved an average area under the ROC curve of 0.73 on unseen test data. Using feature importance analysis, we identified a number of features that have a high discriminatory value for calving difficulty, including maternal and paternal breeding values, and past phenotypic measurements of the cow.
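As a rough illustration of the bagging idea mentioned in this abstract, the sketch below trains each ensemble member on all minority-class rows plus an equally sized random draw of majority-class rows, averages the members' votes, and scores the result with the area under the ROC curve. It is not the study's pipeline: the data are synthetic, a one-feature threshold rule stands in for XGBoost so the example needs only the standard library, and all function names (`make_data`, `train_stump`, `balanced_bagging`, `roc_auc`) are hypothetical.

```python
import random

def make_data(n_pos, n_neg, seed):
    """Synthetic stand-in data: feature 0 is informative, feature 1 is noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(1.5, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_pos)]
    X += [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_neg)]
    y = [1] * n_pos + [0] * n_neg
    return X, y

def train_stump(X, y):
    """Fit a single-feature threshold rule (assumed stand-in for a boosted-tree model)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            # Rule: predict the positive class when feature j exceeds t.
            acc = sum((row[j] > t) == bool(label) for row, label in zip(X, y)) / len(y)
            if best is None or acc > best[0]:
                best = (acc, j, t)
    _, j, t = best
    return lambda row: 1.0 if row[j] > t else 0.0

def balanced_bagging(X, y, n_bags=25, seed=0):
    """Train one model per bag; each bag holds all positives and as many sampled negatives."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    models = []
    for _ in range(n_bags):
        idx = pos + rng.sample(neg, len(pos))
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    # The ensemble score is the fraction of members voting "positive".
    return lambda row: sum(m(row) for m in models) / len(models)

def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Roughly 3.3% positives, mirroring the class ratio quoted in the abstract.
X_train, y_train = make_data(n_pos=66, n_neg=1934, seed=1)
X_test, y_test = make_data(n_pos=33, n_neg=967, seed=2)

ensemble = balanced_bagging(X_train, y_train)
auc = roc_auc(y_test, [ensemble(row) for row in X_test])
print(f"held-out AUC: {auc:.3f}")
```

Undersampling inside each bag lets every member see a balanced problem, while averaging over many bags recovers information from the majority rows that any single bag discards.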

    fseval: A Benchmarking Framework for Feature Selection and Feature Ranking Algorithms

    The fseval Python package allows benchmarking Feature Selection and Feature Ranking algorithms on a large scale, and facilitates the comparison of multiple algorithms in a systematic way. In particular, fseval enables users to run experiments in parallel and distributed over multiple machines, and export the results to an SQL database. The execution of an experiment can be fully determined by a configuration file, which means the experiment results can be reproduced easily, given only the configuration file. fseval has high test coverage, continuous integration, and rich documentation. The package is open source and can be installed through PyPI. The source code is available at: https://github.com/dunnkers/fseval